CRYPTOGRAPHY-BASED LOW DISTORTION ROBUST DATA AUTHENTICATION SYSTEM AND METHOD THEREFOR

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention generally relates to authentication of data, such as an image or video, that survives incidental modifications to the data content which do not affect the authenticity of the file, such as modifications caused by noise, lossy compression-decompression, or digital-to-analog-to-digital (D/A/D) conversion of the data file.

Description of the Related Art 

In a world where electronic multimedia data such as images and video are routinely transferred and modified, authentication of data becomes important for verifying the integrity of the data. In these applications, data being authentic includes the notions that the data has not been tampered with, or that it came from the right owner (i.e., the origin of the data can be verified). One of the requirements of an authentication system for multimedia data such as images, video and sound is that the data survives incidental modifications such as lossy compression-decompression, noise, printing and scanning, or digital-to-analog-to-digital conversion while retaining its authenticity. On the other hand, malicious modifications should render the data inauthentic. Such authentication systems are called robust authentication systems.

Almost all proposed authentication systems have the following general form. Some essential data is extracted from the source data, and an authentication tag is created from it. The authentication tag is appended to or inserted into the source data; the result is called authenticatable data. Because the authentication tag is generally much smaller than the source data, some data reduction occurs in generating the tag. In some robust authentication systems, the authenticatable data must be distorted from the source data to enable authentication. This distortion is referred to as authenticability distortion.

To authenticate the authenticatable data, the appended (or inserted) authentication tag is extracted from the data. Next, the essential data is extracted from the data, and a second authentication tag is created from it. These two authentication tags are then compared. If they compare favorably, the data is deemed authentic.

Most conventional robust authentication schemes can be classified into two classes. The main difference between the two classes lies in the way data reduction is performed.

The first class performs data reduction by extracting some relevant features (such as the edges in the image) from the data and using them in the authentication tag (e.g., see "Content-based integrity protection of digital images", Maria Paula Queluz, Proceedings of SPIE, vol. 3657, pp. 85-93, 1999; "Compression Tolerant Image Authentication", Sushil Bhattacharjee and Martin Kutter, Proc. ICIP 1998; and commonly-assigned U.S. Patent Application No. 09/398,203 entitled "Semi-fragile Watermarks", filed on September 17, 1999 to Martens et al.). In these systems, small changes in the image result in small changes in the tag. Furthermore, as authenticity is based on similarity between the two tags, small differences between the two tags do not destroy the authenticity of the file. There is little or no authenticability distortion.

However, a drawback of this type of authentication scheme is that, because small changes in the image result in small changes in the tag, it is potentially easy to find forged images which generate the same or similar tags as the original image. For example, as pointed out in "Distortion Bounded Authentication Techniques", Nasir Memon, Poorvi Vora, Boon-Lock Yeo and Minerva Yeung, Proceedings of the SPIE, vol. 3971, pp. 164-174, 2000, many images have the same set of edges, yet their content is different (e.g., an image of a coffee stain versus a blood stain). In the language of cryptography, the function which computes the tag from the original image is not pre-image resistant.

A second type of authentication scheme utilizes a cryptographic hash function to reduce the data and generate a relatively small tag from the image. In this case, the two tags must be identical to ensure authenticity. The reader is referred to, for example, the aforementioned paper by Memon et al. It is noted that cryptographic hashes have the property that small changes in the image result in large changes in the tag, and the use of a cryptographic hash function makes it extremely difficult to generate forged images that have the same tag as the original image.
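The avalanche property described above is easy to observe. The following Python sketch is only an illustration under stated assumptions: it uses SHA-256 from the standard `hashlib` module as the cryptographic hash, and a 64-byte string stands in for an image; `hash_tag` and `differing_bits` are hypothetical helper names, and this is not the construction of Memon et al.

```python
import hashlib

def hash_tag(data: bytes) -> str:
    """Hex SHA-256 digest, standing in for a hash-based authentication tag."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a_hex: str, b_hex: str) -> int:
    """Number of bit positions in which two hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

original = bytes(range(64))      # toy 8x8 "image" with pixel values 0..63
tampered = bytearray(original)
tampered[10] ^= 1                # change one pixel value by one gray level

t_orig = hash_tag(original)
t_tamp = hash_tag(bytes(tampered))
print(t_orig == t_tamp, differing_bits(t_orig, t_tamp))
```

Changing a single pixel by one gray level typically flips roughly half of the 256 digest bits, which is why hash-based tags must match exactly and why a near-miss tag is of no use to a forger.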

However, these methods modify the source image significantly in order for the image to be authenticatable (i.e., there is a significant amount of authenticability distortion). For example, in the paper by Memon et al., the pixels of the image are quantized and the quantized image is made authenticatable. The amount of authenticability distortion applied to the image can be as large as the maximum amount of modification to the image that the authentication system is willing to tolerate before the image is deemed inauthentic. This is not acceptable in cases where the authenticatable images must be of high quality, whereas images of a lesser quality can still be considered authentic. This is especially true when the images are printed on paper and authentication is done by scanning the printed image. In an application such as the "digital notary" presented below, the authenticability distortion must be zero.



SUMMARY OF THE INVENTION 

In view of the foregoing and other problems of the conventional methods and systems, an object of the present invention is to provide a robust authentication system (and method) that survives minor modifications to the data and combines the advantages of the two classes of robust authentication systems discussed above.

In a first aspect, a method for generating an output file from a source file, where benign modifications to a content of the output file still render the output file authentic, includes constructing an index vector from the source file, quantizing the source file, generating an authentication mark from the quantized source file and the index vector, generating an authentication tag by appending the index vector to the authentication mark, and generating the output file by appending the authentication tag to the source file.
Preferably, the inventive system (and method) modifies the data little or not at all in generating the authenticatable data, by minimizing or reducing the authenticability distortion, yet utilizes digital signatures and cryptographic hash functions to make forgery attacks difficult.

To make the data I authenticatable, first an n-dimensional vector V corresponding to some essential features of data I is constructed from data I. This vector is referred to as the feature vector of the data. It is preferable that the function which computes V from I be smooth. Furthermore, it is preferable that this function be invertible or nearly invertible, to avoid the problems of the first class of algorithms discussed earlier. Some examples of feature vectors in the case of images include properly scaled, possibly quantized, Discrete Cosine Transform (DCT) coefficients or properly scaled, possibly quantized, Discrete Fourier Transform (DFT) magnitude coefficients. It is desirable for V to be a real n-dimensional vector in an appropriate space where distances correspond roughly to perceptual differences, or to some other metric which indicates the amount of malicious modification.

For each of the n components of V, a quantization function is chosen from a predetermined set of quantization functions. The quantization function is chosen to have a small quantization error with respect to this component. The information about which quantization functions are chosen is stored in the index vector X. The feature vector V is quantized by these quantization functions; the quantized feature vector and the index vector X are signed jointly by a digital signature algorithm; and the resulting signature S, along with a losslessly compressed form of X, forms the authentication tag T.

Next, modified data I' is made from data I. For example, in the case of images, I' could be obtained from I by lossy compression such as Joint Photographic Experts Group (JPEG) processing. A general text on the JPEG compression standard is "JPEG: Still Image Data Compression Standard", Pennebaker and Mitchell, Van Nostrand Reinhold, 1993. I' can depend on the feature vector V. The difference between I' and I is the authenticability distortion.

In practical implementations, this distortion is preferably made minimal. In some embodiments where this distortion is desired to be zero, I' is set equal to I.

Then, the authentication tag T is appended to or inserted into I', resulting in authenticatable data.

To authenticate a dataset, the authentication tag T is first extracted from the dataset. Then, the index vector X is extracted from T by removing the signature S and decompressing the remainder.

Using X, a set of quantization functions is found. Then, the feature vector V is constructed from the dataset and quantized using the set of quantization functions corresponding to X, and the signature S in T is checked to see whether it corresponds to signing the quantized V and X jointly. If so, the data is authentic. Otherwise, the data is not authentic.

With the invention, forgeries are prevented (or made extremely difficult) by the use of cryptographic hash functions, since it is difficult to find forged images which generate the same tag as the original image. Hence, the system and method are much more secure than the first class of conventional schemes. Further, because more than one quantization function is used, the inventive method and system do not modify the source image significantly in order for the image to be authenticatable, thereby overcoming the problems of the second class of conventional schemes. Thus, there is no significant amount of authenticability distortion.
As described below, the inventive method and system allow various parameters, such as the length of the authentication tag or the maximum amount of modification tolerated, to be traded off in a gradual manner against the amount of authenticability distortion.



BRIEF DESCRIPTION OF THE DRAWINGS 



The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

Figures 1A-1B are flow diagrams illustrating the steps of a general authentication scheme, including generation of authenticatable data and authentication of data, respectively;

Figures 2A-2B are flow diagrams of the authentication scheme of the present invention, including generation of authenticatable data and authentication of data, respectively;

Figures 3A-3B are flow diagrams of a preferred embodiment of the present invention, including generation of authenticatable data and authentication of data, respectively;

Figure 4 is a diagram of the quantization functions used in a preferred embodiment of the present invention;

Figures 5A-5C show how the bits in X are ordered in a preferred embodiment for an example image with 4 blocks, with Figure 5A being for a grayscale image, Figure 5B being for a color image, and Figure 5C being another ordering of the bits in X for a color image;

Figure 6 shows how the components of the feature vector V should be distorted in a modification of the preferred embodiment of the present invention, to trade off authenticability distortion against the size of the authentication tag;

Figure 7 shows how the components of the feature vector V should be distorted in a modification of the preferred embodiment, to trade off authenticability distortion against the amount of modification tolerated before the image becomes inauthentic;

Figure 8 is a flow diagram of the generation of authenticatable data in a preferred embodiment of the present invention where authenticability distortion is applied;

Figure 9 shows an application of the present invention to generate authenticatable printed documents;

Figure 10 shows an application of the present invention to digitally notarize original printed or handwritten documents;

Figure 11 illustrates an exemplary information handling/computer system for use with the present invention; and

Figure 12 illustrates a storage medium 1200 for storing steps of the program for the method according to the present invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION



Referring now to Figures 1A-1B, there is shown a diagram of a general data authentication system.

Figure 1A shows the process of generating authenticatable data and Figure 1B shows the authentication process. Source data I (e.g., an image, video, etc.) is fed into data reduction and tag generation device 102, which reduces the data and generates an authentication tag T. In data preprocessor 101, authenticability distortion is applied to the source data I, resulting in data set I'. Then, the authentication tag T is combined with the data set I' in 103 to generate authenticatable data I_a.

To authenticate I_a, as shown in Figure 1B, the authentication tag T is extracted from I_a (104) and T is used to check whether I_a is authentic (105).

Referring now to Figures 2A-2B, there is shown a diagram of the preferred embodiment of the present invention.

Figure 2A shows the process of generation of authenticatable data and Figure 2B shows the authentication process according to the present invention.

Authenticatable Data Generation 

First, a series of quantization functions q(j) is fixed (selected) in advance (e.g., a set of five functions is selected). The quantization functions can be considered data reduction functions: they take a data set (e.g., 16 bits) and generate a smaller data set (e.g., 1 bit). In general, a quantization function can be any function which is not one-to-one. In practice, a quantization function maps well-defined or connected regions of points to a single point. The quantization functions are selected such that, for any point in the space, there is at least one quantization function for which the point lies near the middle of one of its quantization regions, so that if the image is moved (distorted) slightly, the point still maps to the same quantized value.

Then, the feature vector V is computed from the source data I in 202. Such a vector V 
can be computed from a source file by an algorithm or be set to be equal to I. 

For each component V_i of V, a quantization function q(j_i) is chosen in 203. q(j_i) is chosen such that quantizing V_i using q(j_i) results in a predetermined small amount (or the least amount) of quantization error. That is, as noted above, once the quantization functions are selected, the function selected for any given point is the one which gives the smallest error. It is noted that the invention will still operate even if the quantization function selected is not the one providing the smallest error; however, performance may be lower.
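The selection rule of step 203 can be illustrated with a hypothetical predetermined set of five quantizers (matching the example of five functions above): unit-step uniform quantizers whose reconstruction points are shifted by 0, 0.2, 0.4, 0.6, and 0.8. The offsets, the step, and the names `make_shifted_quantizer` and `choose_index` are assumptions for this sketch. Because the five sets of reconstruction points together are spaced 0.2 apart, the chosen quantizer never has an error above 0.1, so every point sits at least 0.4 from the edges of its quantization cell, which is the "middle of the set" property described earlier.

```python
import math
from typing import Callable, List

Quantizer = Callable[[float], float]

def make_shifted_quantizer(offset: float, step: float = 1.0) -> Quantizer:
    """Uniform quantizer with reconstruction points offset + k * step."""
    def q(x: float) -> float:
        return offset + step * math.floor((x - offset) / step + 0.5)
    return q

# Hypothetical predetermined set q(0), ..., q(4): unit-step quantizers shifted by 0.2.
QUANTIZERS: List[Quantizer] = [make_shifted_quantizer(j / 5) for j in range(5)]

def choose_index(v: float, quantizers: List[Quantizer] = QUANTIZERS) -> int:
    """Step 203: index j of the quantizer with the smallest error |q(j)(v) - v|."""
    errors = [abs(q(v) - v) for q in quantizers]
    return errors.index(min(errors))
```

For example, `choose_index(0.41)` selects the quantizer with offset 0.4 (index 2), whose error is about 0.01.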

The indices j_i are stored in the index vector X. Then, the feature vector V is quantized according to q(j_i) (204). Then, the index vector X is appended to the quantized V, resulting in W (206).

A digital signature algorithm (207) is applied to W, resulting in a signature S. The index vector X is then compressed (205) with a lossless compression algorithm and appended to S (208), resulting in an authentication tag T. Authenticability distortion is applied to I, resulting in I' (201).

Then, T is appended to or inserted into I', resulting in authenticatable data I_a (209). (It is noted that steps 201, 202, and 205 are optional to the method of the invention, but are preferably performed.)



Data Authentication 

Referring to Figure 2B, to authenticate I_a, T is first extracted (210). Then, the signature S is removed from T (211). The remaining portion of T is a compressed index vector (212). This compressed index vector is decompressed to obtain the index vector X (213), and a feature vector V is computed from I_a (214).

Using the indices j_i in X, the components of V are quantized using q(j_i) (215). X is appended to the quantized V, resulting in W (216). Then, W is verified using the corresponding signature verification algorithm and the signature S (217). If the signature S verifies with W, the data is authentic. Otherwise, it is not authentic.
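The generation (Figure 2A) and authentication (Figure 2B) flows can be sketched end to end in Python. This is a minimal sketch under several assumptions: the feature vector V is given directly as a list of floats; the quantizer set is the pair of integer and half-integer quantizers used in the image embodiment described later; HMAC-SHA256 from the standard library stands in for the digital signature (the text notes that a message authentication code may replace it); zlib stands in for the lossless compressor; ties are broken deterministically rather than at random; and the byte layout of W and T, the key, and all function names are hypothetical.

```python
import hashlib
import hmac
import math
import zlib
from typing import List, Sequence

KEY = b"hypothetical-shared-key"   # assumption: a MAC key stands in for a signing key
SIG_LEN = 32                       # length of an HMAC-SHA256 value in bytes

def quantize(j: int, x: float) -> float:
    """Quantizer set {q(0), q(1)}: nearest integer or nearest half-integer."""
    return math.floor(x + 0.5) if j == 0 else math.floor(x) + 0.5

def choose(x: float) -> int:
    """Step 203: index of the quantizer with the smaller error (ties -> 0 here)."""
    return 0 if abs(quantize(0, x) - x) <= abs(quantize(1, x) - x) else 1

def build_w(v: Sequence[float], x_idx: Sequence[int]) -> bytes:
    """Steps 204/206: quantized V followed by X (values doubled to serialize as ints)."""
    quantized = [int(2 * quantize(j, vi)) for j, vi in zip(x_idx, v)]
    return (",".join(map(str, quantized)) + "|").encode() + bytes(x_idx)

def mac(w: bytes) -> bytes:
    """Step 207 stand-in: HMAC-SHA256 over W instead of a digital signature."""
    return hmac.new(KEY, w, hashlib.sha256).digest()

def generate_tag(v: Sequence[float]) -> bytes:
    """Figure 2A: T = S || compressed X."""
    x_idx = [choose(vi) for vi in v]
    return mac(build_w(v, x_idx)) + zlib.compress(bytes(x_idx))

def authenticate(v: Sequence[float], tag: bytes) -> bool:
    """Figure 2B: split T, recover X, re-quantize V with X, and check S."""
    s, compressed_x = tag[:SIG_LEN], tag[SIG_LEN:]
    x_idx: List[int] = list(zlib.decompress(compressed_x))
    if len(x_idx) != len(v):
        return False
    return hmac.compare_digest(s, mac(build_w(v, x_idx)))

features = [3.1, -0.6, 7.45, 2.0]
tag = generate_tag(features)
print(authenticate([3.15, -0.62, 7.4, 2.05], tag))  # small changes: True
print(authenticate([5.0, -0.6, 7.45, 2.0], tag))    # one large change: False
```

Because the chosen quantizer leaves each component at most 0.25 from its quantized value, a component can drift by less than 0.25 without changing W, while a larger change can alter W and make the MAC check fail.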

In an alternative implementation, the data set W in both the generation of the tag and in 
the authentication phase is generated by appending the compressed form of X to the quantized V. 

IMAGE DATA SET

Referring now to Figures 3A-3B, a preferred embodiment of the invention is shown for the case where the data set is an image; these Figures illustrate a special implementation of the invention.

That is, Figure 3A shows the process of generation of authenticatable data and Figure 3B shows the authentication process for the case where the features being used/examined are discrete cosine transform (DCT) coefficients, described in further detail below. The case of a grayscale image will be considered first.

First, the image is separated into 8x8 pixel blocks (301). If the image cannot be partitioned into 8x8 pixel blocks, rows and columns of zeros are added to the image until it can. Another method of adding rows and columns of pixels is to reflect pixels along the image boundaries.
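Step 301's padding can be sketched in plain Python, assuming the image is a list of rows of pixel values. Both options from the text are shown; the symmetric reflection used here (which repeats the edge pixel) is an assumption, since the text does not specify a reflection convention, and `pad_to_blocks` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]

def pad_to_blocks(img: Image, mode: str = "zeros", block: int = 8) -> Image:
    """Grow img so its height and width are multiples of `block`.

    mode="zeros" appends zero-valued pixels; mode="reflect" mirrors the
    existing pixels across the bottom and right image boundaries.
    """
    h, w = len(img), len(img[0])
    new_h = -(-h // block) * block          # round up to a multiple of block
    new_w = -(-w // block) * block
    if mode == "reflect" and (new_h - h > h or new_w - w > w):
        raise ValueError("image too small to reflect across its boundary")
    out: Image = []
    for r in range(new_h):
        if r < h:
            src = img[r]
        elif mode == "reflect":
            src = img[2 * h - 1 - r]        # mirror about the last row
        else:
            src = [0.0] * w
        row = list(src)
        for c in range(w, new_w):
            row.append(src[2 * w - 1 - c] if mode == "reflect" else 0.0)
        out.append(row)
    return out
```

A 10x9 image, for instance, becomes 16x16 under either mode.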

For each block, a 2-dimensional Discrete Cosine Transform (DCT) is applied. Then, each DCT coefficient is scaled by dividing it by a corresponding scaling value (302). Examples of tables of such scaling values are given in Tables 4-1 and 4-2 of the aforementioned book by Pennebaker and Mitchell.
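Step 302 can be sketched as a direct 8x8 DCT followed by element-wise division. The sketch assumes the normalization of the JPEG forward DCT and uses, purely as an example set of scaling values, the luminance quantization table published in Annex K of the JPEG standard (ITU-T T.81). No level shift is applied, the function names are hypothetical, and the quadruple loop is chosen for clarity rather than speed.

```python
import math
from typing import List

Block = List[List[float]]

# Example scaling values: the Annex K luminance table of ITU-T T.81 (JPEG).
LUMA_TABLE: List[List[int]] = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def dct_8x8(block: Block) -> Block:
    """2-D DCT-II of an 8x8 block with the JPEG FDCT normalization."""
    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def scaled_dct(block: Block, table: List[List[int]] = LUMA_TABLE) -> Block:
    """Step 302: divide each DCT coefficient by its scaling value."""
    coeffs = dct_8x8(block)
    return [[coeffs[u][v] / table[u][v] for v in range(8)] for u in range(8)]
```

For a flat block with every pixel equal to 8, the DC coefficient is 64, all other coefficients are zero up to rounding, and the scaled DC coefficient is 64 / 16 = 4.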

Next, for each of the resulting scaled DCT coefficients, one of two quantization functions (i.e., q0 or q1) is chosen (303). The two quantization functions q0 and q1 are shown in Figure 4, and can be expressed as

q0(x) = round(x)
q1(x) = round(x + 0.5) - 0.5

where round(x) is the integer closest to x.
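The two quantizers translate directly into code. One assumption: the text does not say how exact halves are rounded, so this sketch rounds them up via floor(x + 0.5); Python's built-in `round()` would instead round halves to the nearest even integer.

```python
import math

def q0(x: float) -> float:
    """round(x): the integer closest to x; exact halves round up (an assumption)."""
    return math.floor(x + 0.5)

def q1(x: float) -> float:
    """round(x + 0.5) - 0.5: the half-integer (k + 0.5) closest to x."""
    return math.floor((x + 0.5) + 0.5) - 0.5
```

For example, q0(2.3) = 2 while q1(2.3) = 2.5, and q1(3.1) = 3.5.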

That is, Figure 4 shows the quantization functions used in the preferred embodiment of the present invention. The X-axis of Figure 4 is the input data (e.g., 16-bit data, a real number, etc.) and the Y-axis is the output. In both q0 and q1, a range of inputs is mapped to the same value on the Y-axis. Thus, for any input within a given range, the same number is output. This is the data reduction which allows the image to tolerate some minor modification.

The range of a quantization function is called its set of quantized values. In particular, the quantized values of q0 are the integers Q0 = {..., 0, 1, 2, 3, ...}, while the quantized values of q1 are Q1 = {..., 0.5, 1.5, 2.5, 3.5, ...}. The quantization function q chosen is the one which minimizes the quantization error (i.e., if x is the DCT coefficient, q is chosen such that |q(x) - x| is minimal). Another method is to choose the quantization function q_t, where

t = argmin_i d(Q_i, x)

and d(Q_i, x) denotes the distance from x to the set Q_i in the space of real numbers. In the case of q0 and q1 as described above, these two methods give the same result. In case of a tie (e.g., |q1(x) - x| = |q0(x) - x|), a quantization function (i.e., either q0 or q1) is chosen at random.

For each DCT coefficient, a single bit of the index vector X is assigned to indicate which of the two quantization functions is chosen (i.e., a "0" bit is assigned if q0 is chosen and a "1" bit is assigned if q1 is chosen). These bits form the index vector X (303). Thus, there are as many bits in X as there are pixels in the image (after padding to whole 8x8 blocks).
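The selection rule and the construction of X can be sketched as follows, assuming the scaled DCT coefficients are already available as a flat list in the chosen coefficient order. A seeded `random.Random` is used only to make the sketch reproducible; the text requires only that ties be broken at random. The helper names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

def q0(x: float) -> float:
    return math.floor(x + 0.5)          # nearest integer

def q1(x: float) -> float:
    return math.floor(x + 1.0) - 0.5    # nearest half-integer

def choose_bit(x: float, rng: random.Random) -> int:
    """0 if q0 has the smaller error, 1 if q1 does; a tie is broken at random."""
    e0, e1 = abs(q0(x) - x), abs(q1(x) - x)
    if e0 != e1:
        return 0 if e0 < e1 else 1
    return rng.randint(0, 1)

def index_vector(coeffs: Sequence[float], seed: Optional[int] = None) -> List[int]:
    """Step 303: one bit of X per scaled DCT coefficient, in the given order."""
    rng = random.Random(seed)
    return [choose_bit(x, rng) for x in coeffs]
```

For example, `index_vector([0.1, 0.4, 2.6, -1.45])` yields `[0, 1, 1, 1]`, and a full 8x8 block contributes 64 bits.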

For a color image, the feature vector V is derived from the DCT coefficients of 8 by 8 
blocks in all the three color planes. In this case, the number of bits in X is three times the 
number of pixels in the image. 

Then, the DCT coefficients are quantized according to the chosen quantization functions (304). The function q1'(x) = round(x + 0.5) can also be used instead of q1(x) = round(x + 0.5) - 0.5 in generating the quantized DCT coefficients. This ensures that all the quantized DCT coefficients are integers. Then, X is appended to the quantized DCT coefficients to form W (306).
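Steps 304 and 306 can be sketched with the integer-valued variant q1'. The serialization of W (each quantized coefficient as a big-endian 32-bit integer, followed by one byte per bit of X) is a hypothetical layout chosen for illustration; any unambiguous encoding used identically at signing and verification time would serve. Because X is carried in W, the verifier always knows which quantizer produced each integer.

```python
import math
import struct
from typing import List, Sequence

def q0(x: float) -> int:
    return math.floor(x + 0.5)          # nearest integer

def q1_prime(x: float) -> int:
    """round(x + 0.5) without the -0.5 offset, so the result is an integer."""
    return math.floor(x + 1.0)

def quantize_coeffs(coeffs: Sequence[float], x_bits: Sequence[int]) -> List[int]:
    """Step 304: apply q0 where the X bit is 0 and q1' where it is 1."""
    return [q1_prime(c) if b else q0(c) for c, b in zip(coeffs, x_bits)]

def build_w(quantized: Sequence[int], x_bits: Sequence[int]) -> bytes:
    """Step 306: W = quantized coefficients (big-endian int32) followed by X."""
    return struct.pack(f">{len(quantized)}i", *quantized) + bytes(x_bits)
```

For example, `quantize_coeffs([2.3, 2.6, -1.45], [0, 1, 1])` gives `[2, 3, -1]`.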

Then, W is signed by a digital signature algorithm such as the Digital Signature Algorithm (DSA) (307), resulting in a signature S. Examples of digital signature algorithms can be found in "Handbook of Applied Cryptography", Menezes, van Oorschot and Vanstone, CRC Press, 1997. Practical digital signature algorithms typically include a cryptographic hash function to reduce the data and generate a relatively small signature.

Then, the index vector X is compressed using a lossless compression algorithm such as Huffman encoding or Lempel-Ziv-Welch (LZW) encoding (305). A useful textbook on compression algorithms is "Introduction to Data Compression", Khalid Sayood, Morgan Kaufmann Publishers, Inc., 1996.

In the preferred embodiment, the bits which form X are ordered as follows to facilitate compression of X. Consider the ordering of the DCT coefficients in each block as described in Figure 10-5 in the text by Pennebaker and Mitchell.

Figures 5A-5C show how the bits in X are ordered in a preferred embodiment for an example image with 4 blocks, with Figure 5A being for a grayscale image, Figure 5B being for a color image, and Figure 5C being another ordering of the bits in X for a color image.

First, the bits corresponding to the first DCT coefficient in each block are collected, followed by the bits corresponding to the second DCT coefficient in each block, etc., as illustrated in Figure 5A.
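The Figure 5A ordering can be sketched as a coefficient-major interleave across blocks. The within-block order of Figure 10-5 in Pennebaker and Mitchell is assumed here to be the usual JPEG zigzag scan; `zigzag_order`, `block_bits_in_scan_order`, and `order_grayscale` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in JPEG zigzag scan order."""
    def key(rc: Tuple[int, int]) -> Tuple[int, int]:
        r, c = rc
        s = r + c
        # Odd anti-diagonals are scanned with increasing row, even ones with increasing column.
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def block_bits_in_scan_order(bit_block: Sequence[Sequence[int]]) -> List[int]:
    """Flatten one block of X bits in zigzag order."""
    return [bit_block[r][c] for r, c in zigzag_order(len(bit_block))]

def order_grayscale(blocks: Sequence[Sequence[int]]) -> List[int]:
    """Figure 5A: coefficient 0 of every block, then coefficient 1 of every block, ..."""
    return [blk[k] for k in range(len(blocks[0])) for blk in blocks]
```

A plausible reason this ordering helps compression is that high-frequency scaled coefficients are often close to zero, where q0 has the smaller error, so their bits tend to be 0 and collect into long runs toward the end of X.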

If a color image is considered, first the bits corresponding to the first 8 DCT coefficients of the first color dimension (e.g., R in RGB space, L in LAB space, C in CMY space, etc., depending upon the color space) in each block are collected, followed by the bits corresponding to the first 8 DCT coefficients of the second color dimension in each block, etc., as illustrated in Figure 5B.

Figure 5C illustrates another ordering of the bits of the index vector X in the case of a color image, and is simply a modification of the ordering shown in Figure 5B. First, the bits corresponding to the first DCT coefficient of the first color dimension (e.g., R in RGB space, L in LAB space, etc.) in each block are collected, followed by the bits corresponding to the first DCT coefficient of the second color dimension in each block, etc.
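The Figure 5C ordering for a color image is likewise a nested interleave: coefficient outermost, then color plane, then block. The sketch assumes the bits are held as `bits[plane][block][k]`, with k already in the per-block scan order; `order_color_5c` is a hypothetical name.

```python
from typing import List, Sequence

def order_color_5c(bits: Sequence[Sequence[Sequence[int]]]) -> List[int]:
    """bits[plane][block][k] -> coefficient k, then plane, then block (Figure 5C)."""
    n_planes, n_blocks, n_coeffs = len(bits), len(bits[0]), len(bits[0][0])
    return [bits[p][b][k]
            for k in range(n_coeffs)      # first coefficient of every plane, then the next
            for p in range(n_planes)      # all blocks of one plane before the next plane
            for b in range(n_blocks)]
```

With three planes, the result has three bits per pixel, matching the count stated earlier for color images.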

Since it is known exactly how many bits are in X (e.g., in a grayscale image it equals 64 times the number of 8x8 blocks, i.e., one bit per DCT coefficient), the trailing zeros in X can be removed before compression (305). In the authentication phase, it is again known how many bits are in X, so X is retrieved by decompression and by adding back the right number of trailing zeros (314). It is noted that the steps of adding and removing trailing zeros are optional.
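Removing and restoring the trailing zeros of X around a lossless compressor can be sketched as follows. Assumptions: X is packed eight bits per byte, most significant bit first, and zlib's DEFLATE (which combines LZ77 with Huffman coding) is used in place of the Huffman or LZW coders named above; the function names are hypothetical. Restoration works because the total length of X is known in the authentication phase.

```python
import zlib
from typing import List, Sequence

def compress_x(x_bits: Sequence[int]) -> bytes:
    """Step 305: drop trailing zeros, pack 8 bits per byte, DEFLATE-compress."""
    n = len(x_bits)
    while n and x_bits[n - 1] == 0:
        n -= 1
    packed = bytearray((n + 7) // 8)
    for i in range(n):
        if x_bits[i]:
            packed[i // 8] |= 0x80 >> (i % 8)   # most significant bit first
    return zlib.compress(bytes(packed), 9)

def decompress_x(blob: bytes, total_bits: int) -> List[int]:
    """Step 314: unpack, then restore the trailing zeros up to the known length."""
    packed = zlib.decompress(blob)
    bits = [(byte >> (7 - j)) & 1 for byte in packed for j in range(8)][:total_bits]
    return bits + [0] * (total_bits - len(bits))
```

An index vector made entirely of zeros packs to an empty byte string before compression, which is the extreme case exploited by the first modification described below.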

The compressed form of X is appended to the signature S to form an authentication tag T (308). Then, this authentication tag is appended onto or inserted into the image I (309).

The tag T can be appended onto I by writing it into a comment field of the image format. Image formats which support such fields include JPEG and the Tag Image File Format (TIFF). For example, the tag T can be appended onto I by writing T into the "COM" (Comment) marker segment when the JPEG image format is used, or into the Image Description Tag when the TIFF image format is used. The tag T can also be inserted into I by a robust data hiding scheme. Examples of robust data hiding schemes can be found in "Improving data hiding by using convolutional codes and soft-decision decoding", J. R. Hernandez, J-F. Delaigle and B. Macq, Proc. SPIE, vol. 3971, pp. 24-47, 2000, and "Preprocessed and postprocessed quantization index modulation methods for digital watermarking", B. Chen and G. W. Wornell, Proc. SPIE, vol. 3971, pp. 48-59, 2000. The robust data hiding scheme should be robust enough that the tag T can be recovered from the image exactly even under minor modifications to the image.
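Writing T into a JPEG "COM" marker segment can be sketched at the byte level. Per the JPEG standard (ITU-T T.81), a COM segment is the marker bytes 0xFF 0xFE followed by a 2-byte big-endian length that counts itself, so one segment carries at most 65533 payload bytes. Placing the segment after any leading APPn segments, so that a JFIF or Exif header stays first, is an assumption of this sketch, as are the function names; the parser is simplified (it ignores fill bytes and stops at the first SOS marker). TIFF is not covered.

```python
import struct
from typing import List

SOI = b"\xff\xd8"
COM = 0xFE   # comment marker code
SOS = 0xDA   # start-of-scan marker code

def insert_com_segment(jpeg: bytes, payload: bytes) -> bytes:
    """Return a copy of the JPEG stream with `payload` stored in a new COM segment."""
    if not jpeg.startswith(SOI):
        raise ValueError("missing SOI marker; not a JPEG stream")
    if len(payload) > 65533:
        raise ValueError("payload too large for a single COM segment")
    pos = 2
    # Skip leading APPn segments (markers 0xE0..0xEF) so JFIF/Exif headers stay first.
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + seg_len
    segment = bytes([0xFF, COM]) + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + segment + jpeg[pos:]

def read_com_segments(jpeg: bytes) -> List[bytes]:
    """Collect COM payloads that appear before the first SOS marker."""
    found: List[bytes] = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == SOS:
            break   # entropy-coded image data follows; stop scanning
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == COM:
            found.append(jpeg[pos + 4:pos + 2 + seg_len])
        pos += 2 + seg_len
    return found
```

Since T is binary, an application may prefer to store a text encoding of it (for example, base64) in the comment field; that choice is outside what the text specifies.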
To authenticate an authenticatable image, the authentication tag T is extracted (310). After the signature S is removed from T (312), the remainder of T forms the compressed index vector (313). This is decompressed and trailing zeros are added to obtain X (314).

Then, the image is decomposed into 8x8 blocks (311), a DCT is applied to each block, and the DCT coefficients are scaled by dividing them by the scaling values (315). Then, the DCT coefficients are quantized according to the quantization functions given by the bits in X (316). Then, X is appended to the resulting quantized DCT coefficients (317), and the result is verified against the signature S by the corresponding signature verification algorithm (318). If it is verified, the image is authentic. Otherwise, the image is deemed to be not authentic.

The digital signature algorithm used in 207, 217, 307, and 318 can be replaced with message authentication codes or modification detection codes, depending on the type of application. For a complete discussion of such codes, the reader is referred to the text by Menezes et al. mentioned above.

In the above preferred embodiment, when the authentication tag T is appended to the image by writing it into the comment field of the image format, there is no authenticability distortion. Thus, in the flow diagram of Figure 2A, I' = I. When there is an invertible transformation between the image I and the feature vector V, there are two modifications to the preferred embodiment which allow the present invention to trade off authenticability distortion against some other parameter of the system. Essentially, these two modifications to the preferred embodiment include a step to apply authenticability distortion (201), and they make I' differ from I.

The first modification allows the invention to trade off the authenticability distortion against the size of the authentication tag.

The second modification allows the invention to trade off the authenticability distortion against the amount of modification the image can tolerate before it is deemed not authentic.

In the first modification, the image is distorted as follows. Without loss of generality, assume that for the given feature vector V, the number of zeros in the bits of X is larger than the number of ones. The components x_i of the feature vector V which are closer to the quantized values of q1 than to those of q0 are moved closer to the quantized values of q0.

Thus, if d(x_i, Q0) > d(x_i, Q1), then x_i is moved towards y_i, where y_i is the closest point to x_i such that d(y_i, Q0) < d(y_i, Q1). This is shown in Figure 6, where α is moved to α', whereas β is not moved, since d(α, Q0) > d(α, Q1) and d(β, Q0) < d(β, Q1). More distortion shortens the tag more, and less distortion shortens it less: depending on how far these components are moved, the index vector has even more zeros and is thus more compressible, resulting in a smaller authentication tag T. In particular, if every such x_i is changed to y_i, then the resulting index vector consists solely of zeros and can be compressed into a single bit after removing trailing zeros.
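The first modification can be sketched for the q0/q1 pair, treating the components of V as floats. For this pair, the boundary between "closer to Q0" and "closer to Q1" lies 0.25 away from each half-integer, so y_i sits just past that boundary on the side of x_i. The `strength` knob (how far toward y_i each component moves) and the small `EPS` used to realize the strict inequality are hypothetical; rebuilding I' from the moved V is not shown.

```python
import math
from typing import List, Sequence

EPS = 1e-6  # hypothetical margin so a moved point lands strictly on the q0 side

def q0(x: float) -> float:
    return math.floor(x + 0.5)          # nearest integer (element of Q0)

def q1(x: float) -> float:
    return math.floor(x + 1.0) - 0.5    # nearest half-integer (element of Q1)

def toward_q0(v: Sequence[float], strength: float = 1.0) -> List[float]:
    """First modification: pull components closer to Q1 than to Q0 toward Q0.

    With strength 1.0 each such x_i moves to (just past) the nearest point y_i
    where d(y_i, Q0) < d(y_i, Q1); smaller values move it only part of the way,
    so fewer bits of X become 0. Components already closer to Q0 are untouched.
    """
    out = []
    for x in v:
        h = q1(x)
        if abs(x - q0(x)) > abs(x - h):             # d(x, Q0) > d(x, Q1)
            # The decision boundary lies 0.25 from h, on the same side as x.
            y = h + 0.25 + EPS if x >= h else h - 0.25 - EPS
            x = x + strength * (y - x)
        out.append(x)
    return out
```

At full strength every component then favors q0, so X becomes all zeros, at the cost of moving each affected component by at most about 0.25.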

In the second modification, the components of the feature vector are distorted by moving them closer to the nearest quantized value among the quantized values of q0 and q1. This is shown in Figure 7, where α is moved to α' and β is moved to β'. This allows the image to tolerate more changes before it is deemed not authentic.
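The second modification can be sketched the same way. With the q0/q1 pair, each quantization cell has width 1, so a component at distance e from the quantized value of its chosen quantizer is 0.5 - e away from the nearest edge of that cell; `margin` computes that quantity. Moving components toward the nearest point of Q0 ∪ Q1 increases the margin. The `strength` knob and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def q0(x: float) -> float:
    return math.floor(x + 0.5)          # nearest integer

def q1(x: float) -> float:
    return math.floor(x + 1.0) - 0.5    # nearest half-integer

def nearest_quantized(x: float) -> float:
    """Nearest point of Q0 ∪ Q1 (ties go to Q0 here)."""
    a, b = q0(x), q1(x)
    return a if abs(x - a) <= abs(x - b) else b

def toward_nearest(v: Sequence[float], strength: float = 1.0) -> List[float]:
    """Second modification: move each component toward its nearest quantized value."""
    return [x + strength * (nearest_quantized(x) - x) for x in v]

def margin(x: float) -> float:
    """Distance from x to the edge of its chosen quantizer's cell (cell width 1)."""
    return 0.5 - abs(x - nearest_quantized(x))
```

At full strength every component sits exactly on a quantized value, giving the maximum margin of 0.5, at the cost of moving each component by up to 0.25.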

Figure 6 shows how to add distortion that trades off against the size of the tag, whereas Figure 7 shows how to trade off against the impact of minor changes to the image. Hence, both Figures 6 and 7 change the image (i.e., add authenticability distortion) to produce an authenticatable image, but they trade off different things. In Figure 6, as enough distortion is added, the tag becomes very small. In Figure 7, as enough distortion is added, the authenticatable image can tolerate more changes before losing its authenticity.
These modifications only affect the generation of authenticatable data (e.g., see Figure 3A). Adding these modifications to Figure 3A results in Figure 8, which shows a flow diagram of the generation of authenticatable data in a preferred embodiment with these modifications. That is, Figure 8 is similar to Figure 3A, but shows the distortion being added.

In both of these modifications to the preferred embodiment, after the feature vector V is distorted (810), a new image I' is constructed from V (811). The rest of the scheme remains the same, and the tag is appended to (or inserted into) I' (812) to form the authenticatable data. It is noted that both of these modifications can be applied simultaneously or in different parts of the image. These modifications can also be adapted in a straightforward way to the general system described in Figures 2A-2B.

In yet another modification of the above preferred embodiment, more than two quantization functions are used.
When the image is printed or displayed, the authentication tag can be printed or displayed alongside the image in a robust format. For example, the authentication tag, which includes a series of bits, can be printed below the image in a 1-D or 2-D barcode format or in an OCR (optical character recognition)-friendly font.

In some applications, the tag can be printed (or attached) as a magnetic strip or an RFID (Radio Frequency Identification) tag alongside the image. To authenticate the printed image, the image itself is scanned in, while the authentication tag is read in by a scanner, a barcode reader, a magnetic strip reader, an RFID reader, or other appropriate technologies.

Some image processing operations, such as thresholding and removal of minor noise, can be applied to the scanned image before authentication.

The present invention has applications in authenticating printed documents, whether printed
locally by a trusted printer or remotely. For example, the present invention can be adapted for
use with U.S. Patent Application No. 09/398,028, filed on September 17, 1999 to Braudaway et
al., entitled "METHOD AND SYSTEM FOR REMOTE PRINTING OF DUPLICATION RESISTANT
DOCUMENTS", for printing duplication-resistant documents. The paper medium on which the
document is printed is duplication-resistant and contains identifying information such as a
serial number. The image containing the content of the document, together with the identifying
information on the paper medium, forms a composite image, which is then made authenticatable
by the present invention. The image containing the content of the document is then printed on
the paper medium along with the tag, which is printed in a machine-readable format such as a
barcode.
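Forming a composite image means the tag covers the paper's identifying information as well as the content, so a tag copied onto another sheet no longer verifies. A minimal sketch of that binding, with simplifying assumptions: the serial number is supplied as text rather than as pixels of the composite image, the authentication mark is an HMAC-SHA256 (one possible message authentication algorithm) over the serial number and the already-quantized content values, and the index vector that accompanies the mark in the tag is omitted. The key, serial numbers, and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key, serial numbers, and names; a sketch of the binding only.


def composite_mark(key: bytes, serial: str, quantized: list[int]) -> bytes:
    """MAC over the paper serial number and the quantized document content.
    A NUL byte separates the two fields so they cannot run into each other."""
    msg = serial.encode("utf-8") + b"\x00" + ",".join(map(str, quantized)).encode("ascii")
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, serial: str, quantized: list[int], mark: bytes) -> bool:
    """Recompute the mark from the scanned sheet and compare in constant time."""
    return hmac.compare_digest(composite_mark(key, serial, quantized), mark)


key = b"hypothetical-key"
mark = composite_mark(key, "SN-000123", [12, -3, 7])
```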
As shown in Figure 9, to authenticate the document 900 (e.g., a stock certificate or a
negotiable instrument), the composite image 901 is scanned in and the authentication tag 902 is
read in by a scanner or a barcode reader, and the document is then authenticated according to
the present invention. Alignment marks 903 on the document can help in reading the composite
image and/or the authentication tag. Figure 9 also shows the serial number 904 of the paper
medium and the image 905 of the content of the document.

For a color document, color calibration bars can help in scanning the proper colors from
the document, but they are not preferred, as they could be a security hole that a counterfeiter
can take advantage of. Note that the portion of the composite image 901 where the identifying
information of the paper medium is located has no (or little) authenticability distortion, since
it belongs to the paper medium and should not be modified. In applications where duplication
resistance is not needed, the paper medium does not need to be duplication-resistant, and the
identifying information on the paper medium (e.g., its serial number) can be omitted.
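Leaving the paper's identifying information undistorted amounts to masking the distortion step. A minimal sketch, assuming each feature of the composite image is flagged as belonging either to the paper medium (protected) or to the printed content, and assuming the distortion pulls each content feature toward the reconstruction point of its chosen quantizer (two quantizers of step `step`, the second offset by half a step). The mask, the names, and the pull factor `alpha` are hypothetical.

```python
# Hypothetical mask, names, and pull factor; a sketch, not the patented method.


def quantize(x: float, step: float, index: int) -> float:
    """Quantizer `index` (0 or 1); quantizer 1 is offset by half a step."""
    offset = index * step / 2
    return round((x - offset) / step) * step + offset


def distort_outside(v: list[float], indices: list[int], protected: list[bool],
                    step: float, alpha: float = 0.5) -> list[float]:
    """Pull unprotected features toward their reconstruction points;
    features of the paper medium (protected=True) are returned unchanged."""
    return [x if keep else x + alpha * (quantize(x, step, i) - x)
            for x, i, keep in zip(v, indices, protected)]


features = [3.1, 7.9, -2.4, 10.6]
indices = [0, 0, 1, 1]
protected = [False, False, True, True]  # last two lie in the serial-number area
out = distort_outside(features, indices, protected, 4.0)
```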



DIGITAL NOTARIZATION

In contrast to the case in Figure 9, in which the document is printed at the same time as the
tag, Figure 10 addresses the case in which the document is first printed, handwritten, etc., and
is then made authenticatable, with the tag printed later.

Thus, an application of the present invention for digitally notarizing printed or handwritten
original documents 1001 is described below, as shown in Figure 10. In this application, a
printed or handwritten original document must be made authenticatable. The original document is
produced independently of the process of making it authenticatable. In other words, it can be
printed using special inks, contain handwritten signatures, etc. Next, the document is scanned in
as an image, and the method of the present invention is applied to generate an authentication
tag T.

Then, the tag T is printed onto the document in a robust format such as a barcode, as
discussed above, in a location that does not obstruct the original document. In this "digital
notary" application, the authenticability distortion must be zero, as the image of the original
document is not (or cannot be) modified; the method of the present invention only generates the
tag T, which is printed onto the paper of the original document.
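In the zero-distortion case the tag is computed from the scanned image and nothing is written back into it. The sketch below follows the structure recited in the claims (an index vector, quantization according to it, an authentication mark over the quantized values and the index vector, and a tag consisting of the index vector plus the mark), under stated assumptions: the features stand for values extracted from the scan, the two quantizers are uniform with a hypothetical step `STEP` of 8 and offsets 0 and `STEP/2`, and the mark is an HMAC-SHA256 rather than a public-key digital signature. Because each feature lies within `STEP/4` of its chosen reconstruction point, a rescan that moves every feature by less than `STEP/4` still verifies.

```python
import hashlib
import hmac

STEP = 8.0  # hypothetical quantization step size; names below are illustrative


def cell(x: float, index: int) -> int:
    """Cell number of x under quantizer `index` (grid offset index * STEP / 2)."""
    return round((x - index * STEP / 2) / STEP)


def distance(x: float, index: int) -> float:
    """Distance from x to its reconstruction point under quantizer `index`."""
    return abs(x - (cell(x, index) * STEP + index * STEP / 2))


def _mark(key: bytes, cells: list[int], indices: list[int]) -> bytes:
    """Authentication mark over the quantized values and the index vector."""
    return hmac.new(key, repr((cells, indices)).encode("ascii"), hashlib.sha256).digest()


def make_tag(key: bytes, features: list[float]) -> tuple[list[int], bytes]:
    """Return (index vector, mark); the features themselves are left untouched."""
    indices = [min((0, 1), key=lambda i: distance(x, i)) for x in features]
    cells = [cell(x, i) for x, i in zip(features, indices)]
    return indices, _mark(key, cells, indices)


def verify(key: bytes, features: list[float], indices: list[int], mark: bytes) -> bool:
    """Re-quantize the rescanned features with the stored indices and check the mark."""
    cells = [cell(x, i) for x, i in zip(features, indices)]
    return hmac.compare_digest(_mark(key, cells, indices), mark)


key = b"hypothetical-notary-key"
scan = [12.3, 40.1, -7.7, 95.0]
indices, mark = make_tag(key, scan)
rescan = [13.8, 38.2, -6.7, 94.5]  # every feature moved by less than STEP / 4
```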

There are other applications where the original data cannot be changed and therefore the
authenticability distortion must be zero. For example, images on a CD-R (Recordable
CD-ROM) cannot be modified, yet authentication tags can be added to the images.

Another example is in the field of generating authentication tags for duplication- and
imitation-resistant objects. For example, in U.S. Patent Application No. 09/397,503, entitled
"Method and apparatus for producing duplication- and Imitation-resistant identifying marks on
objects, and duplication- and imitation-resistant objects", filed on September 17, 1999 to
Aggarwal et al., an object is produced by, for example, a chemical process resulting in a
one-of-a-kind object, and this object can be authenticated using the present invention as follows.

First, the object is read by an appropriate reader, resulting in a data set, and an
authentication tag is generated from this data set. Then, the authentication tag is attached to the
object. To authenticate the object, it is read by the same type of reader, and the resulting data
set is then authenticated using the authentication tag. In this case the authenticability
distortion is zero, as the object is not modified, and the authentication scheme must tolerate
some degree of modification, as the readers might not read exactly the same data set from the
object.

Although the color images discussed above have three color components, the present
invention can be adapted by one skilled in the art to other color spaces (e.g., a 4-color space
such as cyan, magenta, yellow, and black (CMYK)), taking the present application as a whole.

Furthermore, in addition to the indices of the quantization functions, additional information
such as the date, time, name of the owner, or size of the image can be added to the vector X.
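The text does not specify how the indices and the extra information are laid out in the vector X. One possible layout, assumed here: a 4-byte count of quantizer indices, the indices packed one bit each (two quantization functions), and the additional fields (date, owner, image size, and so on) serialized as JSON with sorted keys so the same fields always produce the same bytes. The mark would then be computed over the quantized data together with these bytes, so altering any field invalidates the tag. All names and field values here are hypothetical.

```python
import json

# Hypothetical layout of the vector X; names and fields are illustrative only.


def pack_bits(bits: list[int]) -> bytes:
    """Pack 0/1 quantizer indices 8 per byte, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for n, b in enumerate(bits):
        if b:
            out[n // 8] |= 0x80 >> (n % 8)
    return bytes(out)


def unpack_bits(data: bytes, count: int) -> list[int]:
    """Inverse of pack_bits for the first `count` indices."""
    return [(data[n // 8] >> (7 - n % 8)) & 1 for n in range(count)]


def build_x(indices: list[int], **fields) -> bytes:
    """Vector X: index count, packed indices, then canonical JSON metadata."""
    meta = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return len(indices).to_bytes(4, "big") + pack_bits(indices) + meta


x = build_x([1, 0, 1, 1, 0, 0, 0, 0, 1], owner="example owner", width=640, height=480)
```

The leading count lets the verifier tell real indices from the zero bits that pad the last byte.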

While the overall methodology of the invention is described above, the invention can be
embodied in any number of different types of systems and executed in any number of different
ways, as would be known by one ordinarily skilled in the art.

For example, Figure 11 illustrates a typical hardware configuration of an information
handling/computer system for use with the invention. In accordance with the invention, the
system preferably has at least one processor or central processing unit (CPU) 1111, and more
preferably several CPUs 1111. The CPUs 1111 are interconnected via a system bus 1112 to a
random access memory (RAM) 1114, read-only memory (ROM) 1116, input/output (I/O) adapter
1118 (for connecting peripheral devices such as disk units 1121, barcode reader 1150, scanner
1160, and tape drives 1140 to the bus 1112), user interface adapter 1122 (for connecting a
keyboard 1124, an input device 1126 such as a mouse, trackball, joystick, or touch screen, a
speaker 1128, a microphone 1132, and/or other user interface devices to the bus 1112),
communication adapter 1134 (for connecting the information handling system to a data
processing network such as an intranet or the Internet (World Wide Web)), and display adapter
1136 (for connecting the bus 1112 to a display device 1138). The display device could be a
cathode ray tube (CRT), liquid crystal display (LCD), etc., as well as a hard-copy printer (e.g.,
a digital printer).

Thus, as shown in Figure 12, in addition to the hardware/software environment described
above, a different aspect of the invention includes a computer-implemented method for
cryptography-based, low distortion, robust data authentication. This method may be implemented
in the particular environment discussed above.

Such a method may be implemented, for example, by operating the CPU 1111 (Figure 11)
to execute a sequence of machine-readable instructions. These instructions may reside in
various types of signal-bearing media.

Thus, this aspect of the present invention is directed to a programmed product,
comprising signal-bearing media tangibly embodying a program of machine-readable instructions
executable by a digital data processor incorporating the CPU 1111 and hardware above, to
perform the above method.

These signal-bearing media may include, for example, a RAM (not shown in Figure 12)
contained within the CPU 1111, or auxiliary to it as in RAM 1114, as represented by fast-access
storage, for example. Alternatively, the instructions may be contained in another signal-bearing
medium, such as a magnetic data storage diskette 1200 (e.g., as shown in Figure 12), directly or
indirectly accessible by the CPU 1111.

Whether contained in the diskette 1200, the computer/CPU 1111, or elsewhere, the
instructions may be stored on a variety of machine-readable data storage media, such as DASD
storage (e.g., a conventional "hard drive" or a RAID array), magnetic tape, electronic read-only
memory (e.g., ROM, EPROM, or EEPROM), an optical storage device (e.g., CD-ROM, WORM,
DVD, digital optical tape, etc.), paper "punch" cards, or other suitable signal-bearing media,
including transmission media such as digital and analog communication links and wireless links.
In an illustrative embodiment of the invention, the machine-readable instructions may comprise
software object code, compiled from a language such as "C", etc.

Thus, with the unique and unobvious aspects of the present invention, a method (and
system) is provided in which forged images having the same or similar tags as the original
image are made difficult to construct, while preserving the requirement that minor modifications
to the image are tolerated. Further, the inventive method and system do not significantly modify
the source image in order to make it authenticatable. Thus, the authenticability distortion is not
significant.

Those skilled in the art will recognize that the invention can be practiced with 
modification within the spirit and scope of the appended claims. 
CLAIMS



We claim: 

1. A method for generating an output file from a source file where benign modifications to a
content of the output file still render the output file authentic, comprising: 

constructing an index vector from said source file;

quantizing said source file; 

generating an authentication mark from the quantized source file and said index vector; 
generating an authentication tag by appending the index vector to said authentication 
mark; and 

generating the output file by appending said authentication tag to said source file.

2. The method of claim 1, where said appending comprises: 

inserting said authentication tag into said source file by a robust data hiding algorithm.

3. The method of claim 1, further comprising: 

compressing said index vector. 

4. The method of claim 1, further comprising:

applying a distortion to said source file, to form a distorted file, 
wherein the generating of the output file is performed by appending said authentication
tag to said distorted file.

5. The method of claim 1, further comprising: 

providing a reader for reading the source file. 

6. The method of claim 1, wherein the source file is positioned in a smart card.

7. The method of claim 1, wherein said authentication mark is obtained by a digital signature 
algorithm. 

8. The method of claim 1, wherein said authentication mark is obtained by a modification 
detection algorithm. 

9. The method of claim 1, wherein said authentication mark is obtained by a message
authentication algorithm. 

10. The method of claim 1, wherein said source file includes image data. 

11. The method of claim 1, wherein said source file includes video data. 



12. The method of claim 1, wherein said source file includes sound data. 
13. The method of claim 1, wherein no distortion is added to the source file to generate the
output file. 

14. The method of claim 1, wherein said tag is created simultaneously with a creation of said
source file. 

15. The method of claim 1, wherein said authentication tag is created after the source file has
been created, and is appended to the source file. 

16. A method for generating an output file from a source file where benign modifications to a 
content of the output file still render the output file authentic, comprising: 
constructing an index vector from said source file; 
constructing a feature vector of said source file;

quantizing said feature vector; 

generating an authentication mark from the quantized feature vector and said index 

vector; 

generating an authentication tag by appending the index vector to said authentication 
mark; and

generating the output file by appending said authentication tag to said source file. 



17. The method of claim 16, further comprising: 
constructing said index vector from said feature vector of said source file.

18. The method of claim 16, further comprising: 

generating a distorted file from said feature vector, 

wherein the generating of the output file is performed by appending said authentication 
tag to said distorted file. 

19. The method of claim 16, wherein said feature vector comprises discrete cosine transform 
coefficients. 

20. A method for generating an output file from a source file where benign modifications to a 
content of the output file still render the output file authentic, comprising: 

constructing an index vector from said source file; 
quantizing said source file; 
compressing said index vector; 

generating an authentication mark from the quantized source file and said compressed 
index vector; 

generating an authentication tag by appending the index vector to said authentication 
mark; and 

generating the output file by appending said authentication tag to said source file. 
21. A method for generating an output file from a source file where benign modifications to a
content of the output file still render the output file authentic, comprising: 

constructing an index vector from said source file; 

quantizing said source file; 
compressing said index vector;

generating an authentication mark from the quantized source file and said index vector; 

generating an authentication tag by appending said compressed index vector to said 
authentication mark; and 

generating the output file by appending said authentication tag to said source file. 

22. A method for generating an output file from a source file where benign modifications to a
content of the output file still render the output file authentic, comprising: 
constructing a feature vector from said source file; 
constructing an index vector from a feature vector of the source file; 
quantizing said feature vector according to the index vector; 
generating an authentication mark from the quantized feature vector and said index vector;

generating an authentication tag by appending the index vector to said authentication 
mark; and 

generating the output file by appending said authentication tag to said source file. 






23. The method of claim 22, further comprising: 
compressing said index vector. 
24. A method for authenticating a data file, comprising:

extracting an authentication tag from said data file; 
extracting an index vector from said authentication tag; 
extracting an authentication mark from said authentication tag; 
quantizing said data file; and

verifying said index vector and said quantized data file with said authentication mark. 

25. The method of claim 24, wherein said index vector comprises a compressed index vector. 

26. The method of claim 25, further comprising: 

decompressing said compressed index vector prior to said quantizing of said data file. 

27. The method of claim 24, wherein said authentication mark is obtained by a digital signature
algorithm. 

28. The method of claim 24, wherein said authentication mark is obtained by a modification 
detection algorithm. 

29. The method of claim 24, wherein said authentication mark is obtained by a message 
authentication algorithm.
30. A method for authenticating a data file, comprising:

extracting an authentication tag from said data file; 
extracting an index vector from said authentication tag; 
extracting an authentication mark from said authentication tag; 
constructing a feature vector from said data file;

quantizing said feature vector; and 

verifying said index vector and said quantized feature vector with said authentication 

mark. 

31. A system for generating an output file from a source file where benign modifications to a
content of the output file still render the output file authentic, said system comprising:

means for constructing an index vector from said source file; 
means for quantizing said source file; 

means for generating an authentication mark from the quantized source file and said 
index vector; 

means for generating an authentication tag by appending the index vector to said

authentication mark; and 

means for generating the output file by appending said authentication tag to said source 

file. 






32. A signal-bearing medium tangibly embodying a program of machine-readable instructions 
executable by a digital processing apparatus to perform a method for generating an output file 
from a source file where benign modifications to a content of the output file still render the
output file authentic, said method comprising: 

constructing an index vector from said source file; 

quantizing said source file; 

generating an authentication mark from the quantized source file and said index vector; 
generating an authentication tag by appending the index vector to said authentication 
mark; and 

generating the output file by appending said authentication tag to said source file. 
CRYPTOGRAPHY BASED LOW DISTORTION ROBUST DATA AUTHENTICATION

SYSTEM AND METHOD THEREFOR 



ABSTRACT OF THE DISCLOSURE 



A method (and system) for generating an output file from a source file where benign 
modifications to a content of the output file still render the output file authentic, includes 
constructing an index vector from the source file, quantizing the source file, generating an 
authentication mark from the quantized source file and the index vector, generating an 
authentication tag by appending the index vector to the authentication mark, and generating the 
output file by appending the authentication tag to the source file. 
one name is listed below) or an original, first and joint inventor (if plural names are listed below) of the subject matter which is claimed and for
which a patent is sought on the invention entitled: CRYPTOGRAPHY-BASED LOW DISTORTION ROBUST DATA AUTHENTICATION 
SYSTEM AND METHOD THEREFOR 

the specification of which: 



I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as amended by any 
amendment referred to above. 

I acknowledge the duty to disclose information which is material to the patentability of this application in accordance with Title 37, Code of 
Federal Regulations, § 1.56.

I hereby claim foreign priority benefits under Title 35, United States Code, § 119 of any foreign application(s) for patent or inventor's certificate 
listed below and have also identified below any foreign application for patent or inventor's certificate having a filing date before that of the 
application on which priority is claimed: 

Prior Foreign Application(s):

Number Country Day/Month/Year Priority Claimed 

I hereby claim the benefit under Title 35, United States Code, § 120 of any United States application(s) listed below and, insofar as the subject 
matter of each of the claims of this application is not disclosed in the prior United States application in the manner provided by the first paragraph 
of Title 35, United States Code, § 112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal 
Regulations, § 1.56 which occurred between the filing date of the prior application and the national or PCT international filing date of this 
application: 

Prior U.S. Applications: 

Serial No. Filing Date Status 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed 
to be true; and further that these statements were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the 
validity of the application or any patent issued thereon. 

As a named inventor, I hereby appoint the following attorneys and/or agents to prosecute this application and transact all business in the Patent 
and Trademark Office connected therewith: We hereby appoint Manny Schecter, Registration No. 31,722, Christopher A. Hughes, Registration No.
26,914, Edward A. Pennington, Registration No. 32,588, John E. Hod, Registration No. 26,279, Joseph C. Redmond, Jr., Registration No. 18,753,
Douglas W. Cameron, Registration No. 31,596, Wayne L. Ellenbogen, Registration No. 43,602, Louis P. Herzberg, Registration No. 41,500, Stephen
C. Kaufman, Registration No. 29,551, Daniel P. Morris, Registration No. 32,053, Louis J. Percello, Registration No. 33,206, David M. Shofi,
Registration No. 39,835, Paul J. Otterstedt, Registration No. 37,411, Robert M. Trepp, Registration No. 25,933, Lauren Bruzzone, Registration No.
35,082, and Marian Underweiser, Registration No. 46,134 to prosecute this application and transact all business in the United States Patent and 
Trademark Office connected therewith. 

Send all correspondence to: Sean M. McGinn, McGinn & Gibb, P.C., 1701 Clarendon Boulevard, Suite 100, Arlington, Virginia 22209. 
Customer No. 21254 

Telephone calls should be directed to Sean M. McGinn, McGinn & Gibb, P.C. at (703) 294-6699. 



(1) Inventor: Chai Wah Wu 



Specification (check one): ☒ is attached hereto. □ was filed on ____, as Application Serial No. ____, and was amended on ____.



Residence: 66 Orchard Drive, Poughquag, NY 12570 




Date: lO ~ t I ~ 2 66Q 



Citizenship: Netherlands 



Post Office Address: Same as Residence 



